This note shows that the results presented by Jabbari Nooghabi et al. (2010) do not hold in all expected cases. Consequently, the technique proposed by Kumar and Lalhita (2012) for detecting upper outliers in Gamma samples is not valid either. Specifically, this note shows that the probability density functions (pdf) under the null hypothesis of the test statistics proposed therein are not always valid.

In the aforementioned works, the authors propose test statistics to detect outliers in Gamma samples within the framework of tests of discordancy for outliers, as defined in Barnett and Lewis (1994).

Following the approach of Barnett and Lewis (1994), the null hypothesis ($H_0$) of a test of discordancy is a statement of an initial probability model that explains the data generating process. For instance, in the case considered here, $H_0$ states that the data are generated as independent observations from a common distribution $F$. If $F$ is a Gamma distribution, as in Jabbari Nooghabi et al. (2010) and Kumar and Lalhita (2012), then $H_0$: $X_1, X_2, \ldots, X_n$ are $n$ independent random variables, each following a Gamma distribution with shape parameter $m > 0$ and scale parameter $\sigma > 0$, denoted by $\Gamma(m, \sigma)$, whose probability density function (pdf) is given by

$$f(x; m, \sigma) = \frac{x^{m-1}}{\Gamma(m)\,\sigma^{m}} \exp\left(-\frac{x}{\sigma}\right), \quad x > 0.$$

As $\sigma$ is a scale parameter, without loss of generality it will be assumed from now on that these random variables are distributed according to a $\Gamma(m, 1)$ law, that is, with pdf given by

$$f(x; m) = \frac{x^{m-1}}{\Gamma(m)} \exp(-x), \quad x > 0.$$


The alternative hypothesis used in Jabbari Nooghabi et al. (2010) and Kumar 


and Lalhita (2012) is the slippage alternative. 


We are interested in detecting $1 \le k < n$ upper outliers using $Z_k$, the statistic proposed by Kumar and Lalhita (2012). After some computations, this statistic can be written as


$$Z_k = \frac{\sum_{j=n-k+1}^{n} (n-j+1)\, Y_j}{\sum_{j=2}^{n} (n-j+1)\, Y_j}, \qquad (1)$$

where

$$Y_j = X_{(j)} - X_{(j-1)}, \quad j = 2, \ldots, n, \qquad (2)$$

$X_{(j)}$ denotes the $j$-th order statistic of the sample $(X_i)_{1 \le i \le n}$ sorted in nondecreasing order, that is, $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$, and $k$ is the number of observations suspected to be upper outliers.
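As a quick check of the notation in Eqs. (1) and (2), the following Python sketch computes the spacings $Y_j$ from a sample and evaluates the weighted sum $\sum_{j=2}^{n}(n-j+1)Y_j$ appearing in the denominator of Eq. (1). Python is our choice here; the note's own simulations were written in R. The sum telescopes to $\sum_{j=1}^{n} X_{(j)} - n X_{(1)}$, which the example verifies numerically. The function names and the seed are illustrative assumptions.

```python
import random


def spacings(sample):
    """Return [Y_2, ..., Y_n] with Y_j = X_(j) - X_(j-1), as in Eq. (2)."""
    x = sorted(sample)
    return [x[j] - x[j - 1] for j in range(1, len(x))]


def weighted_spacing_sum(sample):
    """Denominator of Eq. (1): sum over j = 2..n of (n - j + 1) * Y_j."""
    n = len(sample)
    y = spacings(sample)  # y[j - 2] holds Y_j
    return sum((n - j + 1) * y[j - 2] for j in range(2, n + 1))


if __name__ == "__main__":
    rng = random.Random(2012)  # illustrative seed
    sample = [rng.gammavariate(3.0, 1.0) for _ in range(10)]  # Gamma(3, 1) draws
    # Telescoping identity: the weighted sum equals sum(X) - n * X_(1)
    print(weighted_spacing_sum(sample), sum(sample) - len(sample) * min(sample))
```

The two printed numbers agree up to floating-point rounding.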

As in any statistical test, once the test statistic is proposed we need to determine a rejection criterion tied to a previously specified significance level. To do that, and to compute the p-value associated with a sample, the distribution of the test statistic under the null hypothesis must be known.

In Kumar and Lalhita (2012), the distribution of $Z_k$ under the null hypothesis was obtained mainly from the distribution of the differences of consecutive order statistics of Gamma random variables, i.e., the distribution of the $Y_j$ given in Eq. (2). However, when performing simulations we observed that the pdf of $Z_k$ under the null hypothesis given by Kumar and Lalhita (2012) fits the empirical distribution properly only for $m = 1$, that is, when the random variables $(X_i)_{1 \le i \le n}$ follow an exponential law.


Jabbari Nooghabi et al. (2010) also used the random variables $Y_j$ to find the pdf of the test statistic they proposed under the alternative (Theorem 3.1) and null (Corollary 3.1) hypotheses. Kumar and Lalhita (2012) followed the very same reasoning and methodology used in Theorem 3.1 of Jabbari Nooghabi et al. (2010) to derive the pdf of $Z_k$ under the null hypothesis.

A strong assumption made in both works is that, under the null hypothesis, each $Y_j$ follows a $\Gamma(m, (n-j+1)^{-1})$ distribution. This is not true when $m \neq 1$, as we show in what follows.

Recall that under the null hypothesis of a test of discordancy, $X_1, \ldots, X_n$ are independent and identically distributed Gamma random variables. In general, if $X_1, \ldots, X_n$ are independent and identically distributed continuous random variables, the pdf of $Y_{sr} = X_{(s)} - X_{(r)}$, $1 \le r < s \le n$, can be found by solving the following integral (David and Nagaraja, 2003):


$$f_{Y_{sr}}(y) = \frac{n!}{(r-1)!\,(s-r-1)!\,(n-s)!} \int_{-\infty}^{+\infty} F^{r-1}(x)\, f(x)\, [F(x+y) - F(x)]^{s-r-1}\, f(x+y)\, [1 - F(x+y)]^{n-s}\, dx, \quad y > 0, \qquad (3)$$

where $F$ and $f$ are the cumulative distribution function and the pdf, respectively, of any of the $X_i$ (before sorting).

Replacing $s$ by $j$ and $r$ by $j-1$ in Eq. (3), the pdf of $Y_j = X_{(j)} - X_{(j-1)}$ can be found by solving the following integral:

$$f_{Y_j}(y) = \frac{n!}{(j-2)!\,(n-j)!} \int_{-\infty}^{+\infty} F^{j-2}(x)\, f(x)\, f(x+y)\, [1 - F(x+y)]^{n-j}\, dx. \qquad (4)$$
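Eq. (4) can also be evaluated numerically, which gives an independent check of the $\Gamma(m, (n-j+1)^{-1})$ assumption. Because $f$ vanishes on the negative half-line for Gamma variables, the integral can be taken over $[0, +\infty)$. The Python sketch below assumes an integer shape $m$, so that the $\Gamma(m, 1)$ cdf has the closed Erlang form $F(x) = 1 - e^{-x}\sum_{k=0}^{m-1} x^k/k!$, and approximates the integral with a composite Simpson rule on the truncated interval $[0, 80]$; the truncation point, the number of steps and the function names are ad hoc choices, not taken from the note.

```python
import math


def gamma_pdf(x, m):
    """Gamma(m, 1) density for an integer shape m >= 1."""
    return x ** (m - 1) * math.exp(-x) / math.factorial(m - 1)


def gamma_cdf(x, m):
    """Gamma(m, 1) cdf for an integer shape m (Erlang closed form)."""
    return 1.0 - math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(m))


def spacing_pdf(y, n, j, m, upper=80.0, steps=4000):
    """Approximate Eq. (4) for Y_j = X_(j) - X_(j-1) with Simpson's rule on [0, upper]."""
    const = math.factorial(n) / (math.factorial(j - 2) * math.factorial(n - j))

    def integrand(x):
        return (gamma_cdf(x, m) ** (j - 2) * gamma_pdf(x, m) * gamma_pdf(x + y, m)
                * (1.0 - gamma_cdf(x + y, m)) ** (n - j))

    h = upper / steps  # steps must be even for Simpson's rule
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return const * total * h / 3.0


def claimed_pdf(y, n, j, m):
    """Gamma(m, 1/(n - j + 1)) density, the law assumed for Y_j in the cited works."""
    rate = n - j + 1
    return rate ** m * y ** (m - 1) * math.exp(-rate * y) / math.factorial(m - 1)


if __name__ == "__main__":
    for m in (1, 2, 3):
        # n = 2, j = 2, evaluated at y = 2: numerical Eq. (4) versus the assumed law
        print(m, spacing_pdf(2.0, 2, 2, m), claimed_pdf(2.0, 2, 2, m))
```

For $m = 1$ the two printed values agree; for $m = 2$ and $m = 3$ they do not.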


Let us suppose that the sample is composed of only two random variables $X_1$ and $X_2$, each $\Gamma(m, 1)$ distributed with shape parameter $m \in \mathbb{N}$. Then $n = 2$, and we just have to compute $Y_2 = X_{(2)} - X_{(1)}$. Setting $n = 2$ and $j = 2$ in Eq. (4), and bearing in mind that $m \in \mathbb{N}$, after some computations (see Appendix) the pdf of $Y_2$ can be written as

$$f_{Y_2}(y) = \frac{\exp(-y)}{\Gamma^2(m)} \sum_{i=0}^{m-1} \binom{m-1}{i} \frac{\Gamma(2m-i-1)}{2^{2(m-1)-i}}\, y^{i}, \quad y > 0. \qquad (5)$$
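Eq. (5) is straightforward to code. Writing each term $y^i e^{-y}$ as $\Gamma(i+1)$ times the $\Gamma(i+1, 1)$ density also shows that Eq. (5) is a mixture of the $\Gamma(1,1), \ldots, \Gamma(m,1)$ laws. The Python sketch below (function names are ours) evaluates Eq. (5) and those mixture weights, using $\Gamma(k) = (k-1)!$ for integer $k$.

```python
import math


def y2_pdf(y, m):
    """Eq. (5): pdf of Y_2 = X_(2) - X_(1) for X_1, X_2 iid Gamma(m, 1), m a positive integer."""
    if y <= 0:
        return 0.0
    # Gamma(2m - i - 1) = (2m - i - 2)! for integer arguments
    total = sum(math.comb(m - 1, i) * math.factorial(2 * m - i - 2)
                / 2 ** (2 * (m - 1) - i) * y ** i for i in range(m))
    return math.exp(-y) * total / math.factorial(m - 1) ** 2


def mixture_weights(m):
    """Weight of the Gamma(i + 1, 1) component of Eq. (5), for i = 0, ..., m - 1."""
    return [math.comb(m - 1, i) * math.factorial(2 * m - i - 2) * math.factorial(i)
            / (2 ** (2 * (m - 1) - i) * math.factorial(m - 1) ** 2) for i in range(m)]


if __name__ == "__main__":
    for m in (1, 2, 3, 8):
        w = mixture_weights(m)
        print(m, [round(v, 4) for v in w], sum(w))
```

For $m = 1$ the only weight is 1, recovering the exponential law; for $m = 2$ the weights are $(1/2, 1/2)$; for $m = 3$ they are $(0.375, 0.375, 0.25)$; and in every case they sum to 1.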


As already mentioned, a strong assumption made by Jabbari Nooghabi et al. (2010) and by Kumar and Lalhita (2012) is that if $X_1, X_2$ are random variables distributed according to a $\Gamma(m, 1)$ law, then $Y_2 \sim \Gamma(m, 1)$. But if, for instance, $m = 2$, then by Eq. (5) the pdf $f_{Y_2}$ can be expressed as

$$f_{Y_2}(y) = \frac{1}{2}\left(\exp(-y) + y \exp(-y)\right), \quad y > 0. \qquad (6)$$


This is an equal-weight mixture of a $\Gamma(1, 1)$ and a $\Gamma(2, 1)$ distribution, and not a $\Gamma(2, 1)$ distribution as claimed by both Jabbari Nooghabi et al. (2010) and Kumar and Lalhita (2012). The discrepancy is noticeable, as shown next.
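To put a number on the discrepancy for $m = 2$, the Python sketch below compares the tail probability $P(Y_2 > t)$ and the mean under Eq. (6) with those of the $\Gamma(2, 1)$ law assumed in the cited works. The closed forms follow from integrating the two densities; the choice $t = 3$ and the names used are only illustrative.

```python
import math


def tail_eq6(t):
    """P(Y_2 > t) under Eq. (6): equal-weight mixture of Gamma(1, 1) and Gamma(2, 1)."""
    return 0.5 * math.exp(-t) + 0.5 * (1.0 + t) * math.exp(-t)


def tail_gamma2(t):
    """P(Y > t) for the Gamma(2, 1) law assumed for Y_2 in the cited works."""
    return (1.0 + t) * math.exp(-t)


MEAN_EQ6 = 0.5 * 1.0 + 0.5 * 2.0  # component means equal their shapes (unit scale)
MEAN_GAMMA2 = 2.0

if __name__ == "__main__":
    t = 3.0
    print(f"P(Y2 > {t}): Eq. (6) {tail_eq6(t):.3f} vs Gamma(2, 1) {tail_gamma2(t):.3f}")
    print(f"mean: Eq. (6) {MEAN_EQ6} vs Gamma(2, 1) {MEAN_GAMMA2}")
```

At $t = 3$ the tail probabilities are about 0.124 and 0.199, and the means are 1.5 and 2.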

Algorithm 1 presents the pseudocode used for the discussion. We implemented it in the R programming language (R Core Team, 2014), and ran it with $R = 10000$ replications for each case of $m \in \{1, 3, 8\}$.


Algorithm 1: Pseudocode for the analysis of Y_2.

Data: m, R, and the pseudorandom number generator seed.
Initialize Z of length R;
for r = 1, ..., R do
    Draw X = (X_1, X_2), two independent observations from the Γ(m, 1) distribution;
    Sort X to obtain X_(1) ≤ X_(2);
    Compute Y_2 = X_(2) - X_(1);
    Store Z(r) = Y_2;
end for
Analyze Z;
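For readers who want to reproduce the experiment, the following is a minimal Python translation of Algorithm 1; the original implementation was in R and is not reproduced here. It uses the standard-library `random.gammavariate(alpha, beta)`, whose second argument is a scale parameter. The "Analyze Z" step is simplified here to computing the sample mean of $Z$, which is compared with the mean implied by Eq. (5) and with the mean $m$ of the $\Gamma(m, 1)$ law assumed in the cited works; this summary, the function names and the seed are our assumptions.

```python
import math
import random


def simulate_y2(m, reps, seed):
    """Algorithm 1: draw (X_1, X_2) iid Gamma(m, 1), sort, store Y_2 = X_(2) - X_(1)."""
    rng = random.Random(seed)
    z = []
    for _ in range(reps):
        x1 = rng.gammavariate(m, 1.0)  # shape m, scale 1
        x2 = rng.gammavariate(m, 1.0)
        x_min, x_max = sorted((x1, x2))
        z.append(x_max - x_min)
    return z


def mean_eq5(m):
    """Mean of Eq. (5): integrate y * f_{Y_2}(y), using int_0^inf y^(i+1) e^(-y) dy = (i+1)!."""
    total = sum(math.comb(m - 1, i) * math.factorial(2 * m - i - 2) * math.factorial(i + 1)
                / 2 ** (2 * (m - 1) - i) for i in range(m))
    return total / math.factorial(m - 1) ** 2


if __name__ == "__main__":
    for m in (1, 3, 8):
        z = simulate_y2(m, reps=10_000, seed=1)
        # simulated mean, mean from Eq. (5), mean of the assumed Gamma(m, 1) law
        print(m, sum(z) / len(z), mean_eq5(m), m)
```

For $m = 3$, Eq. (5) gives a mean of 1.875, whereas the assumed $\Gamma(3, 1)$ law has mean 3; the simulated means track the former.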


Figure 1 presents the results obtained with this simulation with $R = 10^4$: histograms of $Y_2$, the densities proposed by Jabbari Nooghabi et al. (2010) and Kumar and Lalhita (2012) (dashed lines), and the one we obtained and presented in Eq. (5) (solid lines).



Figure 1 panels: (a) $W \sim \Gamma(1, 1)$; (b) $W \sim \Gamma(3, 1)$; (c) $W \sim \Gamma(8, 1)$.

Figure 1: The pdf of $Y_2$ assumed by Jabbari Nooghabi et al. (2010) and by Kumar and Lalhita (2012) in dashed lines, and in solid lines the pdf given in Eq. (5).


Both densities coincide in the case $m = 1$, i.e., when $X_1, X_2$ follow unit-mean Exponential distributions; cf. Fig. 1(a). Figures 1(b) and 1(c) show the discrepancy between the observed data and the model claimed by Kumar and Lalhita (2012). The data are, however, well fit by the distribution we obtained.
Conclusions


In this work we have shown that if $X_1, X_2$ are independent random variables, each $\Gamma(m, 1)$ distributed with $m \in \mathbb{N}$, $m \ge 2$, then the pdf of $Y_2 = X_{(2)} - X_{(1)}$ is a mixture of $m$ Gamma distributions, with shape parameters $1, \ldots, m$ and unit scale, and not a $\Gamma(m, 1)$ law as claimed by Jabbari Nooghabi et al. (2010) and then assumed by Kumar and Lalhita (2012). Therefore, with this counterexample we conclude that if $m \in \mathbb{N}$, $m \ge 2$, then $Y_j$, as in Eq. (2), does not in general follow a Gamma distribution. This implies that most computations presented by Jabbari Nooghabi et al. (2010) and by Kumar and Lalhita (2012) are not valid, including the pdf of $Z_k$ given by Kumar and Lalhita (2012).


Appendix 

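From Eq. (4) with $n = 2$ and $j = 2$, and since $f(x) = 0$ for $x \le 0$,

$$f_{Y_2}(y) = \frac{2!}{(2-2)!\,(2-2)!} \int_{0}^{+\infty} F^{0}(x)\, f(x)\, f(x+y)\, [1 - F(x+y)]^{0}\, dx = 2 \int_{0}^{+\infty} f(x)\, f(x+y)\, dx$$

$$= \frac{2}{\Gamma^2(m)} \int_{0}^{+\infty} x^{m-1} \exp(-x)\, (x+y)^{m-1} \exp(-(x+y))\, dx = \frac{2\exp(-y)}{\Gamma^2(m)} \int_{0}^{+\infty} \exp(-2x)\, (x^2 + xy)^{m-1}\, dx.$$

Having in mind that $m \in \mathbb{N}$, expanding the binomial $(x^2 + xy)^{m-1} = \sum_{i=0}^{m-1} \binom{m-1}{i} y^{i} x^{2(m-1)-i}$ and using that $\int_{0}^{+\infty} x^{a} e^{-2x}\, dx = \Gamma(a+1)/2^{a+1}$ for $a > -1$, it follows that

$$f_{Y_2}(y) = \frac{2\exp(-y)}{\Gamma^2(m)} \sum_{i=0}^{m-1} \binom{m-1}{i} y^{i} \int_{0}^{+\infty} x^{2(m-1)-i} \exp(-2x)\, dx = \frac{\exp(-y)}{\Gamma^2(m)} \sum_{i=0}^{m-1} \binom{m-1}{i} \frac{\Gamma(2m-i-1)}{2^{2(m-1)-i}}\, y^{i}, \quad y > 0.$$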
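The key step of the Appendix derivation, replacing $\int_0^{+\infty} \exp(-2x)(x^2+xy)^{m-1}\,dx$ by the binomial sum, can be checked numerically. The Python sketch below compares a composite Simpson approximation of the integral, truncated at $x = 60$ (an ad hoc choice), with the closed-form sum for a few values of $m$ and $y$; the function names are ours.

```python
import math


def lhs_integral(y, m, upper=60.0, steps=6000):
    """Composite Simpson approximation of int_0^upper exp(-2x) (x^2 + x*y)^(m-1) dx."""
    def g(x):
        return math.exp(-2.0 * x) * (x * x + x * y) ** (m - 1)

    h = upper / steps  # steps must be even for Simpson's rule
    s = g(0.0) + g(upper)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * g(i * h)
    return s * h / 3.0


def rhs_sum(y, m):
    """Binomial expansion plus int_0^inf x^a e^(-2x) dx = a! / 2^(a+1), with a = 2(m-1) - i."""
    return sum(math.comb(m - 1, i) * y ** i * math.factorial(2 * m - 2 - i)
               / 2 ** (2 * m - 1 - i) for i in range(m))


if __name__ == "__main__":
    for m in (1, 2, 5, 8):
        print(m, lhs_integral(1.0, m), rhs_sum(1.0, m))
```

The two columns agree to many significant digits.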


Incidentally, the expression given for Dixon's $D_k$ statistic in both articles discussed in this work is wrong. They state that $D_k = (X_{(n)} - X_{(n-k)})/X_{(n)}$ when, in fact, it is

$$D_k = \frac{X_{(n)} - X_{(n-k)}}{X_{(n)} - X_{(1)}},$$

the ratio of the gap to the range of the data.
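The corrected statistic is easy to compute. The Python sketch below (function names are ours) evaluates $D_k$ from a sample and, for contrast, the misstated form $(X_{(n)} - X_{(n-k)})/X_{(n)}$.

```python
def dixon_d(sample, k):
    """D_k = (X_(n) - X_(n-k)) / (X_(n) - X_(1)): gap over range, for k upper outliers."""
    x = sorted(sample)
    n = len(x)
    if not 1 <= k <= n - 1:
        raise ValueError("k must satisfy 1 <= k <= n - 1")
    if x[-1] == x[0]:
        raise ValueError("the range is zero; D_k is undefined")
    return (x[-1] - x[n - k - 1]) / (x[-1] - x[0])  # x[n - k - 1] is X_(n-k)


def misstated_d(sample, k):
    """The form given in the two cited articles: (X_(n) - X_(n-k)) / X_(n)."""
    x = sorted(sample)
    return (x[-1] - x[len(x) - k - 1]) / x[-1]


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 10.0]
    print(dixon_d(data, 1), misstated_d(data, 1))  # 7/9 versus 7/10
```

The two forms agree only when $X_{(1)} = 0$ or the gap is zero.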